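Emotional speech analysis is an ongoing research topic. A relatively new problem in this field is the analysis of vocal bursts, i.e., nonverbal vocalizations such as laughs or sighs. Current state-of-the-art approaches to affective vocal burst analysis are mainly based on wav2vec2 or HuBERT features. In this paper, we investigate the use of data2vec, the successor to wav2vec, in combination with a multi-task learning pipeline to tackle different analysis problems at once. To assess the performance of our efficient multi-task learning architecture, we participate in the 2022 ACII Affective Vocal Burst Challenge and show that our approach substantially outperforms the baseline in three different subtasks.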
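Explanation mechanisms from the field of counterfactual thinking are a widely used paradigm for explainable artificial intelligence (XAI), as they follow a natural way of reasoning that humans are familiar with. However, all common approaches from this field are based on communicating information about features or characteristics that are especially important for an AI's decision. We argue that fully understanding a decision requires not only knowledge about relevant features; awareness of irrelevant information also contributes substantially to a user's mental model of an AI system. Therefore, we introduce a new way of explaining AI systems. Our approach, which we call Alterfactual Explanations, is based on showing an alternative reality in which irrelevant features of an AI's input are altered. By doing so, the user directly sees which characteristics of the input data can change arbitrarily without influencing the AI's decision. We evaluate our approach in an extensive user study, revealing that it contributes significantly to participants' understanding of an AI. We show that alterfactual explanations are suited to conveying an understanding of different aspects of the AI's reasoning than established counterfactual explanation methods.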
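Recognizing continuous emotions and action unit (AU) intensities from face videos requires a spatial and temporal understanding of expression dynamics. Existing works rely primarily on 2D face appearance to extract such dynamics. This work focuses on a promising alternative based on parametric 3D face shape alignment models, which disentangle different factors of variation, including expression-induced shape variations. We aim to understand how expressive 3D face shapes are in estimating valence and AU intensities compared to state-of-the-art 2D appearance-based models. We benchmark four recent 3D face alignment models: ExpNet, 3DDFA-V2, DECA, and EMOCA. In valence estimation, the expression features of 3D face models consistently surpassed previous works, yielding average concordance correlations of .739 and .574 on the SEWA and AVEC 2019 CES corpora, respectively. We also study how 3D face shapes perform on AU intensity estimation on the BP4D and DISFA datasets, and report that 3D face features were on par with 2D appearance features for AUs 4, 6, 10, 12, and 25, but not for the entire set of AUs. To understand this discrepancy, we conduct a correspondence analysis between valence and AUs, which indicates that accurate valence prediction may require knowledge of only a few AUs.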
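The concordance correlation reported above is commonly computed as Lin's concordance correlation coefficient (CCC). The sketch below shows that standard formula in plain Python; the function name `concordance_cc` and the toy inputs are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol (for example, per-sequence versus pooled computation) is not specified here.

```python
def concordance_cc(pred, gold):
    """Lin's concordance correlation coefficient for two equal-length sequences.

    Hypothetical helper for illustration; uses population (biased) moments.
    """
    n = len(pred)
    if n != len(gold) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2 * cov / denom


# Perfect agreement gives 1.0.
print(concordance_cc([0.1, 0.5, 0.9], [0.1, 0.5, 0.9]))
# A constant offset lowers CCC (about 0.571 here) although Pearson correlation is 1.
print(concordance_cc([1, 2, 3], [2, 3, 4]))
```

Unlike Pearson correlation, CCC penalizes shifts in mean and scale, which is why it is a frequent choice for continuous emotion prediction.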
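Automatically recognizing apparent emotions from face and voice is hard, in part because of various sources of uncertainty, including in the input data and in the labels used in a machine learning framework. This paper introduces an uncertainty-aware audiovisual fusion approach that quantifies modality-wise uncertainty towards emotion prediction. To this end, we propose a novel fusion framework in which we first learn latent distributions over audiovisual temporal context vectors, and then constrain the variance vectors of the unimodal latent distributions so that they represent the amount of information each modality provides w.r.t. emotion recognition. In particular, we impose calibration and ordinal ranking constraints on the variance vectors of the audiovisual latent distributions. When well calibrated, modality-wise uncertainty scores indicate how much their corresponding predictions may differ from the ground truth labels. Well-ranked uncertainty scores allow the ordinal ranking of different frames across the modalities. To impose both constraints jointly, we propose a softmax distributional matching loss. In both classification and regression settings, we compare our uncertainty-aware fusion model with standard model-agnostic fusion baselines. Our evaluation on two emotion recognition corpora, AVEC 2019 CES and IEMOCAP, shows that audiovisual emotion recognition can benefit considerably from well-calibrated and well-ranked latent uncertainty measures.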
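Job interviews are usually high-stakes social situations in which professional and behavioral skills are required for a satisfactory outcome. Professional job interview trainers give educative feedback about the displayed behavior according to common standards. This feedback can help improve the behavioral skills needed for job interviews. A technological approach to generating such feedback might be a playful and low-key starting point for job interview training. Therefore, we extended an interactive virtual job interview training system with a Generative Adversarial Network (GAN)-based approach that first detects behavioral weaknesses and subsequently generates personalized feedback. To evaluate the usefulness of the generated feedback, we conducted a mixed-methods pilot study using mock-ups of the job interview training system. The overall study results indicate that the GAN-based generated behavioral feedback is helpful. Moreover, participants judged that the feedback would improve their job interview performance.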
Stress has an effect on people's lives that cannot be overstated. While it can be beneficial, since it helps humans adapt to new and different situations, it can also be harmful when not dealt with properly, leading to chronic stress. The objective of this paper is to develop a stress monitoring solution that can be used in real life and that tackles this challenge in a positive way. The SMILE data set was provided to team Anxolotl, and all that was needed was to develop a robust model. We developed a supervised learning model for classification in Python, achieving a final accuracy of 64.1% and an F1-score of 54.96%. The resulting solution passed the robustness test, showing low variation between runs, which was a major point in favor of its possible future integration into the Anxolotl app.
Recently, extensive studies have been conducted on photonic reinforcement learning, which accelerates computation by exploiting the physical nature of light. Previous studies utilized quantum interference of photons to achieve collective decision-making without choice conflicts when solving the competitive multi-armed bandit problem, a fundamental example of reinforcement learning. However, the bandit problem deals with a static environment where the agent's action does not influence the reward probabilities. This study aims to extend the conventional approach to a more general multi-agent reinforcement learning setting, targeting the grid world problem. Unlike the conventional approach, the proposed scheme deals with a dynamic environment where the reward changes because of agents' actions. A successful photonic reinforcement learning scheme requires both a photonic system that contributes to the quality of learning and a suitable algorithm. This study proposes a novel learning algorithm, discontinuous bandit Q-learning, in view of a potential photonic implementation. Here, state-action pairs in the environment are regarded as slot machines in the context of the bandit problem, and the update amount of the Q-value is regarded as the reward of the bandit problem. We perform numerical simulations to validate the effectiveness of the bandit algorithm. In addition, we propose a multi-agent architecture in which agents are indirectly connected through quantum interference of light, and quantum principles ensure the conflict-free property of state-action pair selections among agents. We demonstrate that multi-agent reinforcement learning can be accelerated owing to conflict avoidance among multiple agents.
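To make the "update amount of the Q-value" concrete, here is a minimal sketch of the standard tabular Q-learning update that returns the amount by which the Q-value changed. This is ordinary Q-learning, not the paper's discontinuous bandit Q-learning, whose details the abstract does not give; the action set, the learning rate `alpha`, the discount `gamma`, and the function name `q_update` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical action set for a grid world.
ACTIONS = ("up", "down", "left", "right")


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one standard Q-learning update and return the change in Q.

    The returned value is the quantity that, per the abstract, could be
    treated as the reward of a bandit arm (one arm per state-action pair).
    """
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    delta = alpha * (reward + gamma * best_next - q[(state, action)])
    q[(state, action)] += delta
    return delta


q = defaultdict(float)  # all Q-values start at 0.0
first = q_update(q, (0, 0), "right", 1.0, (0, 1))
second = q_update(q, (0, 0), "right", 1.0, (0, 1))
print(first, second, q[((0, 0), "right")])  # 0.5 0.25 0.75
```

The shrinking update amounts show why such a quantity can serve as a learning signal: a state-action pair whose value has nearly converged yields little further "reward".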
Code generation from text requires understanding the user's intent from a natural language description (NLD) and generating an executable code snippet that satisfies this intent. While recent pretrained language models (PLMs) demonstrate remarkable performance on this task, these models fail when the given NLD is ambiguous because it lacks the specifications needed to generate a high-quality code snippet. In this work, we introduce a novel and more realistic setup for this task. We hypothesize that ambiguities in the specifications of an NLD are resolved by asking clarification questions (CQs). Therefore, we collect and introduce a new dataset named CodeClarQA containing NLD-Code pairs with created clarification questions and answers (CQAs). We evaluate the performance of PLMs for code generation on our dataset. The empirical results support our hypothesis that clarifications result in more precise generated code, as shown by improvements of 17.52 in BLEU, 12.72 in CodeBLEU, and 7.7% in exact match. Alongside this, our task and dataset introduce new challenges to the community, including when and what CQs should be asked.
Neural machine translation (NMT) has become the de facto standard in real-world machine translation applications. However, NMT models can unpredictably produce severely pathological translations, known as hallucinations, that seriously undermine user trust. It thus becomes crucial to implement effective preventive strategies to guarantee their proper functioning. In this paper, we address the problem of hallucination detection in NMT by following a simple intuition: as hallucinations are detached from the source content, they exhibit encoder-decoder attention patterns that are statistically different from those of good-quality translations. We frame this problem with an optimal transport formulation and propose a fully unsupervised, plug-in detector that can be used with any attention-based NMT model. Experimental results show that our detector not only outperforms all previous model-based detectors, but is also competitive with detectors that employ large models trained on millions of samples.
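The intuition above can be illustrated with a small sketch: average the encoder-decoder attention each source token receives, then measure how far that distribution is from a reference using a 1D Wasserstein distance. Everything here is an assumption for illustration, including the uniform reference, the use of token positions as the ground metric, and the function names; it is not the paper's detector, whose cost function and reference distributions the abstract does not specify.

```python
from itertools import accumulate


def source_attention_mass(attn):
    """Average attention per source token.

    attn: list of rows, one per target step; each row is a distribution
    over source tokens (assumed to sum to 1).
    """
    n_tgt = len(attn)
    n_src = len(attn[0])
    return [sum(row[j] for row in attn) / n_tgt for j in range(n_src)]


def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two distributions on positions 0..n-1.

    With unit spacing, this equals the sum of absolute CDF differences.
    """
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def distance_to_uniform(attn):
    """Anomaly score: distance of the source attention mass from uniform."""
    mass = source_attention_mass(attn)
    uniform = [1.0 / len(mass)] * len(mass)
    return wasserstein_1d(mass, uniform)


# Attention spread over all source tokens versus collapsed onto one token.
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
collapsed = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
print(distance_to_uniform(spread) < distance_to_uniform(collapsed))  # True
```

In this toy setup, attention that collapses onto a single source token (a pattern often associated with output detached from the source) receives a larger score than attention that covers the whole source.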
Learning-based image compression has improved to a level where it can outperform traditional image codecs such as HEVC and VVC in terms of coding performance. In addition to good compression performance, device interoperability is essential for a compression codec to be deployed, i.e., encoding and decoding on different CPUs or GPUs should be error-free and incur negligible performance reduction. In this paper, we present a method to solve the device interoperability problem of a state-of-the-art image compression network. We apply quantization to the entropy networks, which output the entropy parameters. We suggest a simple method that ensures cross-platform encoding and decoding and can be implemented quickly, with a minor performance deviation of 0.3% BD-rate from the floating-point model results.
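One common way to make a network's outputs bit-identical across devices is to run it in integer fixed-point arithmetic, since integer operations do not depend on platform-specific floating-point rounding. The sketch below shows a single integer-only linear layer under that assumption; the precision `SCALE_BITS`, the helper names, and the layer shape are hypothetical and are not taken from the paper, whose quantization scheme the abstract does not describe.

```python
SCALE_BITS = 8  # hypothetical number of fractional bits


def to_fixed(x, bits=SCALE_BITS):
    """Convert a float to a fixed-point integer with `bits` fractional bits."""
    return round(x * (1 << bits))


def int_linear(x_q, w_q, b_q, bits=SCALE_BITS):
    """Integer-only y = W x + b on fixed-point inputs.

    Products carry 2 * bits fractional bits, so the accumulator is shifted
    right by `bits` (flooring) before the bias is added.
    """
    out = []
    for row, bias in zip(w_q, b_q):
        acc = sum(w * x for w, x in zip(row, x_q))
        out.append((acc >> bits) + bias)
    return out


x_q = [to_fixed(v) for v in (0.5, -1.0)]
w_q = [[to_fixed(v) for v in (1.0, 0.25)]]
b_q = [to_fixed(0.125)]
y_q = int_linear(x_q, w_q, b_q)
print(y_q, y_q[0] / (1 << SCALE_BITS))  # [96] 0.375
```

Because every step uses exact integer arithmetic, an encoder and a decoder on different hardware would compute the same entropy parameters from the same inputs, at the cost of a small quantization error relative to the floating-point model.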